Unifying the analysis of continuous and categorical measures of weight loss and incorporating group effect: a secondary re-analysis of a large cluster randomized clinical trial using Bayesian approach

Background Although the frequentist paradigm has been the predominant approach to clinical studies for decades, some limitations of frequentist null hypothesis significance testing have been recognized. Bayesian approaches can provide additional insights into data interpretation and inference by deriving posterior distributions of model parameters that reflect the clinical interest. In this article, we sought to demonstrate how Bayesian approaches can improve data interpretation by reanalyzing the Rural Engagement in Primary Care for Optimizing Weight Reduction (REPOWER) trial. Methods REPOWER is a cluster randomized clinical trial comparing three care delivery models: in-clinic individual visits, in-clinic group visits, and phone-based group visits. The primary endpoint was weight loss at 24 months and the secondary endpoints included the proportions of participants achieving 5 and 10% weight loss at 24 months. We reanalyzed the data using a three-level Bayesian hierarchical model. The posterior distributions of weight loss at 24 months for each arm were obtained using Hamiltonian Monte Carlo. We then estimated, for comparisons between arms, the probability of a higher weight loss and the probability of a greater proportion achieving 5 and 10% weight loss. Additionally, a four-level hierarchical model was used to assess the partially nested intervention group effect, which was not investigated in the original REPOWER analyses. Results The Bayesian analyses estimated a 99.5% probability that in-clinic group visits, compared with in-clinic individual visits, resulted in a higher percent weight loss (posterior mean difference: 1.8% [95% CrI: 0.5, 3.2%]), a greater probability of achieving the 5% threshold (posterior mean difference: 9.2% [95% CrI: 2.4, 16.0%]) and the 10% threshold (posterior mean difference: 6.6% [95% CrI: 1.7, 11.5%]). The phone-based group visits had similar results. We also concluded that including intervention group did not impact model fit significantly.
Conclusions We unified the analyses of continuous (the primary endpoint) and categorical measures (the secondary endpoints) of weight loss with one single Bayesian hierarchical model. This approach gained statistical power for the dichotomized endpoints by leveraging the information in the continuous data. Furthermore, the Bayesian analysis enabled additional insights into data interpretation and inference by providing posterior distributions for parameters of interest and posterior probabilities of different hypotheses that were not available with the frequentist approach. Trial registration ClinicalTrials.gov Identifier NCT02456636; date of registry: May 28, 2015. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01499-0.


Introduction
Although the frequentist paradigm has been the predominant approach to clinical studies over the past several decades, a period of tremendous progress in medicine, some limitations of frequentist null hypothesis significance testing (NHST), which reports dichotomized p values, have been recognized in the statistical community [1,2]. One important problem with NHST is that p values are very prone to misinterpretation and are often misused in medical studies [3]. The most common misinterpretation is that a p value is the probability of the null hypothesis. Frequentist methods do not estimate the probability of hypotheses; a p value is the probability of observing data as extreme as, or more extreme than, those observed if the null hypothesis (no treatment effect) is true, which may not be the quantity of interest to the researcher. Additionally, p values are routinely dichotomized using a predefined α level (usually 0.05) to facilitate medical decision-making. A nonsignificant p value (> 0.05) is sometimes misinterpreted as 'no effect', although a nonsignificant result does not distinguish between a true null effect and a lack of statistical power [4]. When the sample size is small or the variation is large, p values can be large even when there is a true effect. Bayesian approaches, on the other hand, can provide additional in-depth insights into data interpretation by deriving posterior distributions of model parameters that reflect clinical interests. The probabilities of different hypotheses can be estimated from the posterior distributions of model parameters, e.g., the probability that treatment A is better than treatment B, or the probability that treatment A is equivalent to treatment B. This allows one to make probabilistic interpretations based on the entire posterior distributions. Furthermore, Bayesian approaches are extremely flexible in that the posterior distributions can be converted to metrics of clinical interest without extra modeling.
In this article, we focused on demonstrating how Bayesian approaches can improve interpretation by reanalyzing the REPOWER [5] data using Bayesian models. We aim to accomplish three goals for weight loss clinical trials: (1) encourage posterior probabilities for interpretation; (2) harmonize clinical weight loss metrics for percent weight loss (continuous) and achievement of weight loss clinical thresholds (binary); and (3) model the clustering of the partially nested intervention group effect common in weight loss studies but ignored in the original REPOWER paper.
Obesity is a chronic condition affecting an increasing number of Americans, with the prevalence reaching 42% in 2017-2018 [6]. It is a serious health risk and is associated with a wide range of morbidities [7]. In 2011, the Centers for Medicare and Medicaid Services (CMS) approved coverage of Intensive Behavioral Therapy for Obesity (IBT) with up to 22 individual 15-min face-to-face visits over a 12-month period [8]. The CMS employs a fee-for-service delivery model, which has been challenged and questioned. A variety of care delivery models have arisen in addition to the traditional face-to-face office visit. REPOWER [5] is a cluster randomized clinical trial comparing the fee-for-service individual delivery model to two alternatives: in-clinic group visits and phone-based group visits. Participant weight was measured at baseline, 6, 18, and 24 months by trained staff. The primary endpoint was weight loss at 24 months. The secondary endpoints included the proportions of participants who achieved 5 and 10% weight loss at 24 months.
In the original analyses [5], frequentist methods were used and inferences were drawn based on p values and confidence intervals. For the primary endpoint, a linear mixed model was used. The in-clinic group visits, but not the phone-based group visits, resulted in a statistically significantly higher weight loss at 24 months when compared with the in-clinic individual visits. For the secondary endpoints, two separate mixed effect logistic models were used to compare the proportions of participants achieving 5 and 10% weight loss at 24 months. None of the comparisons resulted in a significant p value. In this article, we reanalyzed the percent weight loss over time using a Bayesian hierarchical model with noninformative priors. We first obtained the posterior distributions of weight loss at 24 months for each arm using Hamiltonian Monte Carlo. We then estimated the probabilities of a greater weight loss in the in-clinic group visits and the phone-based group visits vs. the in-clinic individual visits. With the same model, we also obtained the posterior distributions for the probabilities of achieving 5% (or 10%) weight loss in each arm, and the probabilities that the two group-based arms had greater probabilities of achieving the weight loss thresholds than the in-clinic individual visits. The Bayesian analysis enabled additional insights into data interpretation and inference by providing posterior distributions for parameters of interest and posterior probabilities of different hypotheses that were not available with the frequentist approach. The Bayesian approach not only provided a better interpretation by reporting probabilities of different hypotheses, but also unified the analyses of the continuous (the primary endpoint) and categorical measures of weight loss (the secondary endpoints) using a single model. This approach resulted in consistent inferences for different endpoints and achieved higher power for the secondary endpoints in comparison with the original analyses.
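The posterior probabilities described above reduce to simple operations on MCMC draws. The following is a minimal sketch, not the authors' code: the function name `summarize_difference` and the simulated draws are illustrative assumptions, and the numbers are arbitrary stand-ins rather than REPOWER results.

```python
import random

def summarize_difference(draws_a, draws_b):
    """Summarize paired posterior draws of mean percent weight loss in two arms.

    Returns the posterior probability that arm A exceeds arm B, the posterior
    mean difference, and an approximate 95% credible interval taken from the
    2.5th and 97.5th percentiles of the sorted differences.
    """
    diffs = sorted(a - b for a, b in zip(draws_a, draws_b))
    n = len(diffs)
    prob_a_greater = sum(d > 0 for d in diffs) / n
    mean_diff = sum(diffs) / n
    lower = diffs[int(0.025 * (n - 1))]
    upper = diffs[int(0.975 * (n - 1))]
    return prob_a_greater, mean_diff, (lower, upper)

# Simulated stand-ins for posterior draws (assumption: NOT the trial posterior).
rng = random.Random(1)
group_arm = [rng.gauss(4.0, 0.5) for _ in range(4000)]
individual_arm = [rng.gauss(2.0, 0.5) for _ in range(4000)]

prob, mean_diff, cri = summarize_difference(group_arm, individual_arm)
print(f"P(group > individual) = {prob:.3f}")
print(f"mean difference = {mean_diff:.2f}, 95% CrI = ({cri[0]:.2f}, {cri[1]:.2f})")
```

The same function applies to any derived quantity, such as the per-draw difference in the probability of reaching a weight loss threshold, because each draw is a joint sample from the posterior.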
Moreover, the original analyses took into consideration the clustering of sites but ignored the clustering of intervention group in the two group-based arms. Intervention group was partially nested because it was relevant to the two group-based arms only. The Bayesian approach can easily handle complex problems using the same statistical framework. We used a four-level hierarchical model with an additional level to assess the partially nested group assignment on the effect of delivery models.

Study design and data structure
REPOWER is a cluster randomized clinical trial with thirty-six primary care practices from three affiliations (academic medical centers that recruited participants for the study: the University of Kansas Medical Center (KUMC), the University of Nebraska Medical Center (UNMC), and the Marshfield Clinic in Wisconsin (Marshfield Clinic)) randomly assigned in equal numbers to one of three study arms: 1) in-clinic individual visits, in which the participants received 15-min face-to-face individual counseling sessions; 2) in-clinic group visits, in which the participants received group visits held at practices with a median of 14 participants per group; 3) phone-based group visits, in which participants received lifestyle intervention delivered remotely via audio-only conference calls with a median of 14 participants per group. The trial was approved by institutional review boards at the University of Kansas Medical Center and the VA Nebraska-Western Iowa Health Care System. All participants provided written informed consent. The re-analysis was done on deidentified data. A total of 1407 participants were included in the final analysis. Weight was measured at baseline, 6, 18, and 24 months by trained staff. The primary outcome was weight loss at 24 months. The secondary outcomes included the proportions of participants achieving 5 and 10% weight loss at 24 months. Detailed information about the conduct of the trial has been published by Befort et al. [5]. In this article, we first analyzed the percent weight loss using a three-level Bayesian hierarchical model to compare the effect of different intervention delivery models on percent weight loss. A second Bayesian hierarchical model additionally included intervention group as a partially nested effect to assess its effect on weight loss.

Model 1: three level Bayesian hierarchical model for percent weight loss
Let $y_{ijt}$ be the percent weight loss for participant $j$ from site $i$ at time $t$. $x_1$ and $x_2$ are the arm indicators: (0,0) for in-clinic individual visits, (1,0) for in-clinic group visits, and (0,1) for phone-based group visits. $t_{18}$ and $t_{24}$ are the time indicators: (0,0) for month 6, (1,0) for month 18, and (0,1) for month 24. We also included arm-by-time interactions so that the delivery model effect can be evaluated at each time point. To be consistent with the original analyses, we included affiliation indicators as covariates (denoted by $x_3$ and $x_4$). The three-level Bayesian hierarchical model can be represented as follows.
$\eta_i$ is the site-level variation and $\alpha_{000}$ is the model intercept.
Noninformative priors were used to allow a like-for-like comparison with the frequentist analyses: the Stan default flat prior, a uniform distribution on the real line, was used for $\alpha_{000}$ and the $\beta$s; a truncated normal distribution $N^{+}(0, 10)$ was used for the standard deviations ($\sigma$, $\sigma_\gamma$, and $\sigma_\eta$) to ensure that only positive values were allowed.
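The display equation for Model 1 is not reproduced in this text. The following is a sketch of a form consistent with the surrounding definitions, not necessarily the authors' exact specification. The ordering of $\beta_1$ to $\beta_8$ and the symbol $\gamma_{ij}$ for the participant-level random intercept are assumptions; the use of $\beta_9$ and $\beta_{10}$ for the affiliation indicators is inferred from the Conclusion and discussion section.

```latex
\begin{aligned}
y_{ijt} &= \alpha_{0ij} + \beta_1 x_1 + \beta_2 x_2 + \beta_3 t_{18} + \beta_4 t_{24} \\
        &\quad + \beta_5 x_1 t_{18} + \beta_6 x_2 t_{18} + \beta_7 x_1 t_{24} + \beta_8 x_2 t_{24} \\
        &\quad + \beta_9 x_3 + \beta_{10} x_4 + \epsilon_{ijt},
           \qquad \epsilon_{ijt} \sim N(0, \sigma^2), \\
\alpha_{0ij} &= \alpha_{0i0} + \gamma_{ij}, \qquad \gamma_{ij} \sim N(0, \sigma_\gamma^2), \\
\alpha_{0i0} &= \alpha_{000} + \eta_i, \qquad \eta_i \sim N(0, \sigma_\eta^2).
\end{aligned}
```

Under this sketch, the three levels are repeated measurements within participants, participants within sites, and sites, which matches the three standard deviations that receive priors ($\sigma$, $\sigma_\gamma$, and $\sigma_\eta$).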

Model 2: Bayesian hierarchical model for percent weight loss with group assignment as a partially nested effect
Participants in the in-clinic group visits arm and the phone-based group visits arm received the interventions in groups. We wanted to examine the impact of group assignment on the effect of intervention delivery methods for the two group-based arms, which was not addressed in the original analyses. In Model 2, we used a four-level hierarchical Bayesian model with the group assignment as a partially nested effect to assess the effect of intervention group.
Let $k > 0$ index the intervention group for participants in the two group-based arms. For participants in the in-clinic individual visits arm, $k = 0$. The four-level Bayesian hierarchical model can be represented as follows.
• $\vartheta_k$ represents the intervention-group-level variation for participants in the two group-based arms; for participants in the in-clinic individual arm, $\vartheta_0 = 0$.
• $\alpha_{0i00} = \alpha_{0000} + \eta_i$, $\eta_i \sim N(0, \sigma_\eta^2)$ represents the site-level variation and $\alpha_{0000}$ is the intercept.
• $\epsilon_{ikjt} \sim N(0, \sigma^2)$ is the within-patient residual error.
The same noninformative priors as in Model 1 were used. To assess whether including intervention group as an additional hierarchical level improved model fit, we used two model selection methods to compare Model 1 and Model 2: leave-one-out cross-validation (LOO-CV) and the widely applicable information criterion (WAIC) [9]. Both methods are implemented in the loo R package [10].
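To make the WAIC comparison concrete, the sketch below computes WAIC on the deviance scale from a matrix of pointwise log-likelihood values (one row per posterior draw, one column per observation). It uses the standard definition, with the log pointwise predictive density minus the sum of posterior variances of the log-likelihood. The function name `waic` and the toy matrices are illustrative assumptions; the analysis itself relied on the loo R package.

```python
import math

def waic(log_lik):
    """WAIC on the deviance scale from log_lik[s][i] (draw s, observation i).

    Smaller values indicate better expected out-of-sample predictive fit.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        col = [log_lik[s][i] for s in range(n_draws)]
        # Log of the posterior-mean likelihood, via log-sum-exp for stability.
        m = max(col)
        lppd += m + math.log(sum(math.exp(v - m) for v in col) / n_draws)
        # Sample variance of the log-likelihood across draws.
        mean = sum(col) / n_draws
        p_waic += sum((v - mean) ** 2 for v in col) / (n_draws - 1)
    return -2.0 * (lppd - p_waic)

# Hypothetical pointwise log-likelihoods for two competing models (toy values).
model_1 = [[-1.2, -0.8, -1.5], [-1.1, -0.9, -1.4], [-1.3, -0.7, -1.6]]
model_2 = [[-1.2, -0.9, -1.5], [-1.0, -0.8, -1.3], [-1.4, -0.7, -1.7]]
print(f"WAIC model 1: {waic(model_1):.2f}, WAIC model 2: {waic(model_2):.2f}")
```

In practice the difference in WAIC (or in LOO-CV expected log predictive density) is judged against its standard error, so a small difference between Model 1 and Model 2 would not by itself favor the more complex model.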

Computation and software
Hamiltonian Monte Carlo [11] was performed in Stan [12] to obtain the posterior distributions for parameters of interest. Graphical representations of posterior distributions were computed from Gaussian kernel density estimates, which provided a smoothed version of the sampled histograms. The R package rstan was used as the interface to call Stan code [13]. All other analyses and plots were produced in R. The Stan code for the two models can be found in Additional file 1.

Model convergence assessment and predictive checking
For both models we ran four parallel MCMC chains with starting points randomly generated from the prior distributions. For each chain, we allowed 3000 iterations for the sampler to converge and another 3000 for sampling the posterior distributions. Convergence was checked visually using trace plots. We also checked the potential scale reduction factor [14] and the effective sample size. For all model parameters, $\hat{R}$ was less than 1.01 and the effective sample size was > 400. Table 1 summarizes the model parameters using posterior means and 95% credible intervals (CrI, calculated by taking the 2.5 and 97.5 percentiles of the posterior distributions) based on the MCMC samples of the posterior distributions. Because noninformative priors were used, the means and 95% CrIs were very close to [...]

[...] .0] respectively for achieving the 10% threshold. The shaded areas (to the right of zero) represent the probabilities of having a higher probability of achieving the thresholds. For both 5 and 10% weight loss, the probabilities were 99.5% for the in-clinic group arm and 98.2% for the phone-based group arm, and they were consistent with the probabilities of having a greater weight loss than the in-clinic individual visits arm, as shown in Fig. 1B.

In the original [...] The current Bayesian analysis reported that the probabilities of a greater weight loss in the in-clinic group visits and phone-based group visits were 99.5 and 98.2% respectively, from which we concluded with high confidence that both group-based arms were superior to the in-clinic individual visits. For the secondary endpoints, the original analyses used two separate mixed effect logistic regressions to compare the odds of achieving 5 and 10% weight loss.
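For the convergence check described at the start of this section, the potential scale reduction factor compares between-chain and within-chain variance. The sketch below implements the classic Gelman-Rubin form as a simplification; Stan reports a refined split-chain, rank-normalized version, so values will not match Stan output exactly. The function name and the simulated chains are illustrative assumptions.

```python
import math
import random
import statistics

def potential_scale_reduction(chains):
    """Classic Gelman-Rubin R-hat for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain sample variance.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: between-chain variance, n times the sample variance of chain means.
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

rng = random.Random(0)
# Four simulated chains targeting the same distribution (stand-in for MCMC output).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Same setup, but one chain is stuck around a different value.
stuck = [list(c) for c in mixed[:3]] + [[rng.gauss(5.0, 1.0) for _ in range(1000)]]

print(f"R-hat, well-mixed chains: {potential_scale_reduction(mixed):.3f}")
print(f"R-hat, one stuck chain:   {potential_scale_reduction(stuck):.3f}")
```

Values close to 1, such as the threshold of 1.01 used here, indicate that the chains agree with one another; a chain exploring a different region inflates the between-chain term and pushes the statistic well above 1.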

Conclusion and discussion
Studies have shown that dichotomizing continuous endpoints results in a loss of information and reduced power [15,16,17]. The current Bayesian analysis assessed the probabilities of achieving 5 and 10% weight loss by integrating the posterior predictive distributions of the weight loss, and reported probabilities of 99.5 and 98.2% that the in-clinic group visits and the phone-based group visits, respectively, had a greater probability of achieving the thresholds than the in-clinic individual visits, whereas the original analyses reported no significant differences across the board. Furthermore, the Bayesian analysis also provided the absolute differences in the probabilities of achieving 5 and 10% weight loss in the in-clinic group visits and phone-based group visits vs. the in-clinic individual visits, which clinicians may prefer over the odds ratios reported in the original analysis.
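One way to see how a single continuous model yields the threshold endpoints is to convert each posterior draw of an arm's expected percent weight loss into a probability of reaching the threshold. The sketch below assumes a normal predictive distribution whose standard deviation is supplied per draw; this is a simplifying assumption for illustration, and the authors' exact computation is in Additional file 1, which is not reproduced here. All names and numbers are hypothetical.

```python
from statistics import NormalDist

def prob_reaching(threshold, mean_loss, sd_loss):
    """P(percent weight loss >= threshold) under a normal predictive distribution."""
    return 1.0 - NormalDist(mean_loss, sd_loss).cdf(threshold)

def threshold_difference(draws_a, draws_b, threshold):
    """Per-draw difference (arm A minus arm B) in the probability of reaching
    the threshold. Each draw is a (mean_loss, sd_loss) pair."""
    return [
        prob_reaching(threshold, mean_a, sd_a) - prob_reaching(threshold, mean_b, sd_b)
        for (mean_a, sd_a), (mean_b, sd_b) in zip(draws_a, draws_b)
    ]

# Hypothetical posterior draws of (expected loss, predictive sd); toy values only.
arm_a = [(4.5, 7.0), (4.1, 7.2), (3.8, 6.9)]
arm_b = [(2.5, 7.0), (2.9, 7.2), (3.9, 6.9)]

diffs = threshold_difference(arm_a, arm_b, threshold=5.0)
share_positive = sum(d > 0 for d in diffs) / len(diffs)
print([round(d, 3) for d in diffs], share_positive)
```

When the predictive standard deviation is the same in both arms within a draw, the threshold probability is a monotone function of the mean, so the arm with the larger expected loss also has the larger probability of reaching any threshold. This is consistent with the identical posterior probabilities reported for the continuous and the dichotomized comparisons.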
In the Quantities of interest section, we used the arithmetic average across affiliations to obtain the average expected percent weight loss for each arm. This method gives each affiliation the same weight. There are other choices for the averaging weights, e.g., weights that are proportional to the numbers of participants or the numbers of sites in each affiliation. The method to use should be determined by the inference one intends to make. For the current study, the primary goal was to compare the three treatment arms. When the proportions of patients in each affiliation are similar across the three arms, the method does not affect the conclusion because $\beta_9$ and $\beta_{10}$ cancel out when we take the difference between arms. Therefore, we would reach the same conclusion if we used weights proportional to the numbers of participants in each affiliation. Besides the advantages we discussed in this study, Bayesian approaches have other strengths, including the ability to incorporate previous evidence through prior distributions to inform the posterior distributions and the ability to update the posterior distributions when new evidence emerges. Bayesian approaches have gained popularity in recent years owing to advances in computing capacity and the development of efficient Bayesian statistical software. However, Bayesian approaches remain underused and are often used as secondary re-analyses. We hope to see Bayesian approaches adopted more frequently as the primary analysis in clinical studies.
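The cancellation of the affiliation terms can be written out in one line. In the sketch below, $m_a$ denotes the part of the expected 24-month percent weight loss for arm $a$ that does not depend on affiliation, and $w_3$, $w_4$ are the averaging weights placed on the affiliations indicated by $x_3$ and $x_4$ (the reference affiliation receives $1 - w_3 - w_4$). The symbols $m_a$, $w_3$, and $w_4$ are introduced here for illustration and do not appear in the original model description.

```latex
\begin{aligned}
\mu_a(w) &= m_a + w_3 \beta_9 + w_4 \beta_{10}, \\
\mu_a(w) - \mu_b(w) &= (m_a + w_3 \beta_9 + w_4 \beta_{10}) - (m_b + w_3 \beta_9 + w_4 \beta_{10}) = m_a - m_b .
\end{aligned}
```

The arithmetic average corresponds to $w_3 = w_4 = 1/3$. Because the same weights are applied to both arms, the between-arm difference is the same for any choice of weights, which is why the weighting scheme matters for the arm-specific averages but not for the arm comparisons.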

Additional file 1.
Acknowledgements The R package brms was used in preparing data and generating Stan code.

Authors' contributions
FT conducted the analyses and wrote the manuscript. BG directed and supervised the project. CB and JW discussed the results and commented on the manuscript. All authors reviewed the manuscript. The author(s) read and approved the final manuscript.

Funding
Research reported in this article was funded through Patient-Centered Outcomes Research Institute (PCORI) award OTO-1402-09413 as well as by The University of Kansas Cancer Center Support Grant (CCSG) awarded by the National Cancer Institute (P30 CA168524).

Availability of data and materials
Data will be made available upon approved requests sent to cbefort@kumc.edu.

Declarations
Ethics approval and consent to participate All subjects provided written informed consent in the parent trial. The reanalysis was done on deidentified data. See Befort et al. (2021) for details. The